Abstract

The problem of channel code design for the $M$-ary input AWGN channel with additive $Q$-ary interference, where the sequence of i.i.d. interference symbols is known causally at the encoder, is considered. The code design criterion at high SNR is derived by defining a new distance measure between the input symbols of Shannon's associated channel. For the binary-input channel, i.e., $M = 2$, it is shown that it is sufficient to use only two (out of $2^Q$) input symbols of the associated channel in the encoding as far as the distance spectrum of the code is concerned. This reduces the problem of channel code design for the binary-input AWGN channel with known interference at the encoder to the design of binary codes for the binary symmetric channel, where the Hamming distance among codewords is the major factor in the performance of the code.

Index Terms 

Causal side information, Shannon's associated channel, channel coding, pairwise error 
probability. 
I. Introduction

Information transmission over channels with known interference at the transmit- 
ter has recently found applications in various communication problems such as digital 
watermarking [1] and broadcast schemes [2]. A remarkable result on such channels 
was obtained by Costa, who showed that the capacity of the additive white Gaussian 
noise (AWGN) channel with additive Gaussian i.i.d. interference where the sequence of 
interference symbols is known non-causally at the transmitter is the same as the capacity 
of the AWGN channel [3]. Therefore, the Gaussian interference does not incur any loss in 
the capacity. This result was extended to arbitrary (random or deterministic) interference 
in [4] by using a precoding scheme based on multi-dimensional lattice quantization. 
Following the famous title of Costa's paper, "Writing on Dirty Paper" [3], coding for the channel
with non-causally known interference at the transmitter is referred to as "dirty paper
coding" (DPC). By analogy, coding for the channel with causally known interference at
the transmitter is sometimes referred to as "dirty tape coding" (DTC). Costa's result
does not hold when the sequence of interference symbols is known only causally at the
transmitter.

Recently, dirty paper coding has emerged as a building block in multiuser communi- 
cation. In particular, there has been considerable research studying the application of dirty 
paper coding to broadcast over multiple-input multiple-output (MIMO) channels. In such 
systems, for a given user, the signals sent to other users are considered as interference. 
Since all signals are known to the transmitter, successive "dirty paper" cancelation can be 
used in transmission after some linear preprocessing [2]. It was shown that DPC in fact 
achieves the sum capacity of the MIMO broadcast channel [5], [6], [7]. Most recently, 
it has been shown that the same is true for the entire capacity region of the MIMO 
broadcast channel [8]. 

These developments motivate the search for realizable dirty paper coding techniques.
Building upon [4], Erez and ten Brink [9] proposed a practical code design based on vector
quantization via trellis shaping, combined with powerful channel codes. Because of the
implementation complexity, their scheme uses the knowledge of the interference only up to
six future symbols rather than the whole interference sequence. Bennatan et al. [10] gave
another design based on superposition coding and successive cancelation decoding; it
uses a trellis-coded quantizer with memory length nine and a low-density parity-check
(LDPC) code as the channel code. Wei Yu et al. [11] gave a design based on convolutional
shaping and channel codes.

The schemes that use the interference sequence only up to the current symbol can serve
as low-complexity solutions to the dirty paper problem. For example, in [1], scalar
lattice quantization is proposed for data hiding even though, in that context, the host
signal is clearly known non-causally.

In this paper, we consider the problem of channel code design for the M-ary 
input AWGN channel with additive causally-known discrete interference. The discrete 
interference model is more appropriate for many practical applications. For example, 
in the MIMO broadcast channel where the transmitter uses a finite constellation, the 
interference caused by other users is discrete rather than continuous. 

Our design does not rely on the precoding scheme based on scalar lattice quantization
for the dirty tape channel [4], [12], which is suboptimal in terms of capacity. Instead, we
consider a new approach based on code design for Shannon's associated channel over
all possible input symbols. Another distinction between our work and related research
in the field is that we consider a finite channel input alphabet rather than a continuous
one.

This paper is organized as follows. In the next section, we summarize Shannon's
work on channels with causal side information at the transmitter. In Section III, we
introduce the channel model. In Section IV, we derive the code design criterion for the
AWGN channel with causally known discrete interference at the encoder. In Section V,
we consider channels with binary input, for which we show that the design criterion
derived in Section IV reduces to maximizing the Hamming distance. In Section VI, we
consider a special case for which the result for the binary channel also holds for the
$M$-ary channel. In Section VII, we consider a more general channel model for which the
main results of this work hold. We conclude the paper in Section VIII.

II. Channels with Side Information at the Transmitter 

Channels with known interference at the transmitter are a special case of channels
with side information at the transmitter, which were considered by Shannon [13] in the
causal knowledge setting and by Gel'fand and Pinsker [14] in the non-causal knowledge
setting.

Shannon considered a discrete memoryless channel (DMC) whose transition matrix
depends on the channel state. A state-dependent discrete memoryless channel (SD-DMC)
is defined by a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, and transition
probabilities $p(y \mid x, s)$, where the state $s$ takes on values in a finite alphabet $\mathcal{S}$. The block
diagram of a state-dependent channel with state information at the encoder is shown in
Fig. 1.

In the causal knowledge setting, the encoder maps a message $w$ into $\mathcal{X}^n$ as

$$x_i = f_i(w, s_1, \ldots, s_i), \qquad 1 \le i \le n. \tag{1}$$

Shannon showed that it is sufficient to consider the coding schemes that use only 
the current state symbol in the encoding process to achieve the capacity of an SD-DMC 
with i.i.d. state sequence known causally at the encoder [13]. 

The SD-DMC can be used in the way shown in Fig. 2 to transmit information. A
precoder is added in front of the SD-DMC. A message $w$ is mapped into $\mathcal{T}^n$, where $\mathcal{T}$
is a new alphabet. The output of the precoder ranges over $\mathcal{X}$ and depends on the current
interference symbol. The regular (stateless) channel from $\mathcal{T}$ to $\mathcal{Y}$ is defined by the
transition probabilities

$$q(y \mid t) = \sum_{s \in \mathcal{S}} p(s)\, p\big(y \mid x = t(s), s\big), \tag{2}$$
Fig. 1. SD-DMC with state information at the encoder.
Fig. 2. The associated regular DMC.

where $p(s)$ is the probability of the state $s$. The DMC defined in (2) is called the
associated channel. Codes for the associated channel describe the codes for the
SD-DMC that use only the current state symbol in the encoding operation. In order to
describe all such coding schemes, $\mathcal{T}$ must include all functions from the state alphabet
to the input alphabet of the state-dependent channel. There are a total of $|\mathcal{X}|^{|\mathcal{S}|}$ such
functions, where $|\cdot|$ denotes the cardinality of a set. Each function can be represented by an
$|\mathcal{S}|$-tuple $(x_1, x_2, \ldots, x_{|\mathcal{S}|})$ of elements of $\mathcal{X}$, meaning that the value of the
function at state $s$ is $x_s$, $s = 1, 2, \ldots, |\mathcal{S}|$.
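The tuple representation can be made concrete with a short sketch. It assumes the state alphabet is given as an ordered sequence, so that entry $k$ of a tuple is the input sent when the state is the $k$-th element of $\mathcal{S}$; the function name `associated_alphabet` is hypothetical, chosen only for this illustration.

```python
from itertools import product


def associated_alphabet(X, S):
    """Return every function from S to X as a |S|-tuple.

    Entry k of each tuple is the input sent when the state equals S[k];
    S is assumed to be an ordered sequence of distinct states.
    """
    return list(product(X, repeat=len(S)))


X = (-1, +1)        # input alphabet
S = (-1, 0, +1)     # state (interference) alphabet
T = associated_alphabet(X, S)
print(len(T))       # 8, i.e. |X| ** |S| = 2 ** 3
print(T[0], T[-1])  # (-1, -1, -1) (1, 1, 1)
```

With $|\mathcal{X}| = 2$ the alphabet doubles with every extra state, which is why exhaustive searches over $\mathcal{T}$ quickly become expensive.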

III. The Channel Model 
We consider data transmission over the channel

$$Y = X + S + N, \tag{3}$$

where $X$ is the channel input, which takes on values in a finite real set $\mathcal{X}$, $Y$ is the
channel output, $N$ is additive white Gaussian noise with power $\sigma^2$, and the interference
$S$ is a discrete random variable that takes on values in a finite real set $\mathcal{S}$. The sequence
of i.i.d. interference symbols is known causally at the encoder.

The above channel can be considered as a special case of the state-dependent
channel considered by Shannon, with one exception: the channel output alphabet
is continuous. In our case, the likelihood function $f_{Y|X,S}(y \mid x, s)$ is used instead of the
transition probabilities. We denote the input to the associated channel by $T$, which can
be considered as a function from $\mathcal{S}$ to $\mathcal{X}$. We denote the cardinalities of $\mathcal{X}$ and $\mathcal{S}$ by
$M$ and $Q$, respectively. Then the cardinality of $\mathcal{T}$ is $M^Q$, the number of all
functions from $\mathcal{S}$ to $\mathcal{X}$.

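The likelihood function for the associated channel is given by

$$\begin{aligned} f_{Y|T}(y \mid t) &= \sum_{s \in \mathcal{S}} p(s)\, f_{Y|X,S}\big(y \mid t(s), s\big) \\ &= \sum_{s \in \mathcal{S}} p(s)\, f_N\big(y - t(s) - s\big), \end{aligned} \tag{4}$$

where $p(s)$ is the probability of the interference symbol $s$ and $f_N$ denotes the pdf of the
Gaussian noise $N$.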
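Equation (4) says that each associated-channel symbol sees a Gaussian mixture centered at the points $t(s) + s$. The sketch below evaluates it directly; the helper names and the numerical values ($\sigma = 0.5$, equiprobable states) are illustrative assumptions, not taken from the paper.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density f_N(x) with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def associated_likelihood(y, t, S, p, sigma):
    """Evaluate f_{Y|T}(y | t) = sum_s p(s) f_N(y - t(s) - s) as in (4).

    t[k] is the input sent in state S[k], and p[k] is the probability of S[k].
    """
    return sum(pk * gaussian_pdf(y - tk - sk, sigma)
               for tk, sk, pk in zip(t, S, p))


S, p = (-1, +1), (0.5, 0.5)
u1, u2 = (-1, +1), (+1, -1)
# u1 puts its mass near -2 and +2, while u2 always lands on 0.
print(associated_likelihood(0.1, u1, S, p, sigma=0.5))
print(associated_likelihood(0.1, u2, S, p, sigma=0.5))
```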

Although we consider a fixed channel input alphabet $\mathcal{X}$ in this work, the transmitted
power is not fixed in general. In fact, for a probability distribution $p(s)$ on $\mathcal{S}$ and a
given coding scheme for the associated channel that induces a probability distribution
$p(t)$ on the symbols of $\mathcal{T}$, the transmitted power is given by

$$E[X^2] = \sum_{t \in \mathcal{T}} \sum_{s \in \mathcal{S}} p(t)\, p(s)\, t^2(s). \tag{5}$$

Thus, in general, the transmitted power depends on the probability distribution on the
interference alphabet. The binary-input channel with $\mathcal{X} = \{-x, x\}$ is an exception,
however, since $t^2(s) = x^2$ for all $s \in \mathcal{S}$. Therefore, for any coding scheme
and any probability distribution on the interference alphabet, the transmitted power is
equal to $x^2$.
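As a quick numerical reading of (5), the sketch below computes $E[X^2]$ for a given distribution on $\mathcal{T}$ and on $\mathcal{S}$. The non-binary input alphabet $\{1, 3\}$ in the example is a hypothetical choice used only to show the dependence on $p(s)$; the binary case reproduces the constant power $x^2$.

```python
def transmitted_power(p_t, p_s):
    """E[X^2] = sum_t sum_s p(t) p(s) t(s)^2, as in (5).

    p_t maps each associated-channel symbol (a tuple aligned with the
    state alphabet) to its probability; p_s[k] is the probability of the
    k-th state. The message and the current state are assumed independent.
    """
    return sum(pt * ps * t[k] ** 2
               for t, pt in p_t.items()
               for k, ps in enumerate(p_s))


# Non-binary inputs {1, 3}: the power changes with p(s).
print(transmitted_power({(1, 3): 1.0}, (0.5, 0.5)))   # 5.0
print(transmitted_power({(1, 3): 1.0}, (0.9, 0.1)))

# Binary inputs {-1, +1}: the power is x^2 = 1 whatever p(s) is.
binary = {(-1, +1): 0.5, (+1, -1): 0.5}
print(transmitted_power(binary, (0.9, 0.1)))
```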

In this work, we do not impose any constraint on the power of the transmitted
signal. However, in the performance comparisons given in Sections V and VI for different
scenarios, we ensure that the transmitted power is the same in all scenarios.

IV. The Code Design Criterion 

Any coding scheme for the associated channel defined by (4) translates to a coding
scheme for the actual channel defined by $f_{Y|X,S}(y \mid x, s)$. We use the pairwise error
probability (PEP) approach to derive the code design criterion at high SNR. Since we
consider fixed channel input and interference alphabets, the high-SNR scenario is
realized by making the noise power $\sigma^2$ sufficiently small. This is equivalent to scaling
up the transmitted signal and the interference by the same factor for a given noise power.

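Suppose that the messages $w_1$ and $w_2$ are encoded into codewords $t_1^n = t_1 t_2 \cdots t_n$
and $r_1^n = r_1 r_2 \cdots r_n$, respectively, where $t_i$ and $r_i$ belong to the alphabet $\mathcal{T}$, $i = 1, \ldots, n$.
In the absence of noise, transmission of the codeword $t_1^n$ can result in many different
received sequences at the channel output, depending on the interference sequence
$s_1^n = s_1 s_2 \cdots s_n$. Specifically, all sequences in $\{(t_1(s_1) + s_1, t_2(s_2) + s_2, \ldots, t_n(s_n) + s_n) : s_1^n \in \mathcal{S}^n\}$
represent the transmitted codeword at the channel output. Likewise, all
sequences in $\{(r_1(s_1) + s_1, r_2(s_2) + s_2, \ldots, r_n(s_n) + s_n) : s_1^n \in \mathcal{S}^n\}$ represent the codeword
$r_1^n$. Using maximum likelihood decoding, the probability of the event that message $w_2$ is
decoded given that message $w_1$ was sent is

$$\begin{aligned}
\Pr\{w_1 \to w_2 \mid w_1\} &= \sum_{s_1^n \in \mathcal{S}^n} p(s_1^n) \Pr\{w_1 \to w_2 \mid w_1, s_1^n\} \\
&= \sum_{s_1^n \in \mathcal{S}^n} p(s_1^n) \Pr\big\{ f_{Y|T}(y_1^n \mid t_1^n) < f_{Y|T}(y_1^n \mid r_1^n) \,\big|\, w_1, s_1^n \big\} \\
&= \sum_{s_1^n \in \mathcal{S}^n} p(s_1^n) \Pr\Big\{ \prod_{i=1}^n f_{Y|T}(y_i \mid t_i) < \prod_{i=1}^n f_{Y|T}(y_i \mid r_i) \,\Big|\, w_1, s_1^n \Big\} \\
&= \sum_{s_1^n \in \mathcal{S}^n} p(s_1^n) \Pr\Big\{ \prod_{i=1}^n \sum_{s \in \mathcal{S}} p(s) f_N\big(y_i - t_i(s) - s\big) < \prod_{i=1}^n \sum_{s \in \mathcal{S}} p(s) f_N\big(y_i - r_i(s) - s\big) \,\Big|\, w_1, s_1^n \Big\}.
\end{aligned} \tag{6}$$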
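In Appendix I, we show that the above error probability at high SNR is given by

$$\Pr\{w_1 \to w_2 \mid w_1\} = O\left( Q\left( \frac{\sqrt{\sum_{i=1}^n d_{\mathrm{SI}}^2(t_i, r_i)}}{2\sigma} \right) \right), \tag{7}$$

where

$$Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right) dy, \tag{8}$$

and $d_{\mathrm{SI}}(t, r)$ (SI stands for side information), the distance between two input symbols $t$ and $r$ of
the associated channel, is defined as

$$d_{\mathrm{SI}}(t, r) = \min_{s_1, s_2 \in \mathcal{S}} \big| t(s_1) + s_1 - r(s_2) - s_2 \big|. \tag{9}$$

According to (7), at high SNR, the code design criterion is to maximize the minimum
distance between the codewords, where the distance between codewords $t_1^n$ and $r_1^n$ is
$\sqrt{\sum_{i=1}^n d_{\mathrm{SI}}^2(t_i, r_i)}$ with $d_{\mathrm{SI}}$ defined in (9).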
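The distance (9) and the high-SNR rule (7) translate into a few lines of code. In this sketch, symbols are tuples aligned with an ordered state alphabet, $Q(x)$ is computed through the standard identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, and the constant hidden in the $O(\cdot)$ of (7) is ignored, so the hypothetical helper `pep_estimate` only indicates the order of magnitude.

```python
import math


def d_si(t, r, S):
    """Distance (9): min over s1, s2 in S of |t(s1) + s1 - r(s2) - s2|.

    t and r are tuples aligned with the ordered state alphabet S.
    """
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


def q_func(x):
    """Gaussian tail Q(x) of (8), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def pep_estimate(c1, c2, S, sigma):
    """Q(sqrt(sum_i d_SI^2(t_i, r_i)) / (2 sigma)), the argument of O(.) in (7)."""
    d2 = sum(d_si(t, r, S) ** 2 for t, r in zip(c1, c2))
    return q_func(math.sqrt(d2) / (2 * sigma))


S = (-1, +1)
u1, u2 = (-1, +1), (+1, -1)
print(d_si(u1, u2, S))                          # 2
print(pep_estimate([u1, u1], [u2, u2], S, 0.5))
```

Longer codewords that differ in more positions accumulate squared distance, so the estimate for the length-2 pair is smaller than for a single symbol.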

A. No Side Information at the Encoder - A Comparison 

In order to see how the knowledge of interference at the encoder can result in
larger distances between codewords, consider the channel model introduced in Section
III with the exception that the interference sequence is not known at the encoder. In this
case, the discrete interference is treated as noise. To obtain the PEP for this
channel, suppose that messages $v_1$ and $v_2$ are encoded into $x_1^n = x_1 \cdots x_n \in \mathcal{X}^n$ and
$z_1^n = z_1 \cdots z_n \in \mathcal{X}^n$, respectively. In a similar way, it can be shown that the PEP at high SNR
is given by

$$\Pr\{v_1 \to v_2 \mid v_1\} = O\left( Q\left( \frac{\sqrt{\sum_{i=1}^n d^2(x_i, z_i)}}{2\sigma} \right) \right), \tag{10}$$

where $d(x, z)$, the distance between two symbols $x$ and $z$ of $\mathcal{X}$, is defined as

$$d(x, z) = \min_{s_1, s_2 \in \mathcal{S}} |x + s_1 - z - s_2|. \tag{11}$$

Comparing (9) and (11), it becomes clear that larger distances among codewords are
possible for the channel with side information at the encoder. In fact, the distance $d(x, z)$
is equal to $d_{\mathrm{SI}}(t, r)$ for $t = (x, \ldots, x)$ and $r = (z, \ldots, z)$. However, $\mathcal{T}$ has many other
symbols, which may yield larger distances. For example, consider the channel with $\mathcal{X} = \mathcal{S} = \{-1, +1\}$.
For the case without side information at the encoder, we can compute the
distances between symbols of $\mathcal{X}$ according to (11) as $d(1, 1) = d(-1, -1) = d(1, -1) = 0$.
Hence, according to (10), it is impossible to transmit data over this channel with low
error probability, even at high SNR. For the case with side information at the encoder,
the four symbols of the associated channel can be represented as $u_1 = (-1, +1)$,
$u_2 = (+1, -1)$, $u_3 = (+1, +1)$, and $u_4 = (-1, -1)$. Using (9), it is easy to check that the distances
between all pairs of these symbols are zero except for $d_{\mathrm{SI}}(u_1, u_2)$, which is 2. As will be
seen in Section V, $u_1$ and $u_2$ can be used in the encoding to achieve arbitrarily low error
probabilities as the SNR increases.
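The numerical claims of this example can be checked by exhaustive enumeration. The sketch below, which assumes the ordering $\mathcal{S} = (-1, +1)$ for the tuple entries, evaluates (11) for every pair of inputs and (9) for every pair of associated-channel symbols, and also exhibits the triangle-inequality violation discussed next.

```python
from itertools import product

X = (-1, +1)
S = (-1, +1)


def d_no_si(x, z):
    """Distance (11): interference unknown at the encoder."""
    return min(abs(x + s1 - z - s2) for s1 in S for s2 in S)


def d_si(t, r):
    """Distance (9): t and r are tuples aligned with S."""
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


# Without side information every pair of inputs is at distance zero.
no_si = {(x, z): d_no_si(x, z) for x in X for z in X}
print(set(no_si.values()))      # {0}

# With side information, keep only the pairs at nonzero distance.
T = list(product(X, repeat=len(S)))
nonzero = {(t, r): d_si(t, r) for t in T for r in T if t < r and d_si(t, r) > 0}
print(nonzero)                  # {((-1, 1), (1, -1)): 2}

# Triangle inequality fails: d(u1, u2) > d(u1, u3) + d(u3, u2).
u1, u2, u3 = (-1, +1), (+1, -1), (+1, +1)
print(d_si(u1, u2), d_si(u1, u3) + d_si(u3, u2))   # 2 0
```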

It is worth mentioning that the distance measures defined in (9) and (11) do not satisfy
the triangle inequality. For example, again consider the channel with $\mathcal{X} = \mathcal{S} = \{-1, +1\}$.
The distances between all pairs of input symbols of the associated channel are zero
except for $d_{\mathrm{SI}}(u_1, u_2) = 2$. Therefore, $d_{\mathrm{SI}}(u_1, u_2) = 2 > d_{\mathrm{SI}}(u_1, u_3) + d_{\mathrm{SI}}(u_3, u_2) = 0$,
which violates the triangle inequality.
V. The Binary Channel

We call the channel introduced in (3) a binary channel when it accepts binary
input, i.e., $M = 2$. There is no constraint on the cardinality of the interference
alphabet. For the binary channel, the size of $\mathcal{T}$ is $2^Q$. However, we may not need to
use all the symbols of this alphabet in the encoding. In this section, we show that it is
sufficient to use only two symbols of $\mathcal{T}$ in the encoding as far as the distance spectrum
of the code is concerned. We begin with the following lemma for the binary channel.

Lemma 1: For the binary channel, there exist at least two symbols in $\mathcal{T}$ with nonzero
distance.

Proof: We explicitly denote the channel input and interference alphabets by
$\mathcal{X} = \{x_1, x_2\}$ and $\mathcal{S} = \{s_1, \ldots, s_Q\}$, where $x_1 < x_2$ and $s_1 < s_2 < \cdots < s_Q$. From the
definition of distance in (9), it is sufficient to show that there exist two elements $t$ and
$r$ in $\mathcal{T}$ such that the corresponding multi-sets¹ (of size $Q$) $\{t(s_1) + s_1, \ldots, t(s_Q) + s_Q\}$
and $\{r(s_1) + s_1, \ldots, r(s_Q) + s_Q\}$ are disjoint. We prove this by induction on $Q$.

The statement of the lemma holds for $Q = 1$, since we may take $t = (x_1)$ and
$r = (x_2)$; the sets $\{x_1 + s_1\}$ and $\{x_2 + s_1\}$ are then disjoint. Now suppose that the
statement of the lemma is true for some $Q$. Then there exist two $Q$-tuples composed of
elements of $\mathcal{X}$ (two input symbols of the associated channel) such that the corresponding
multi-sets are disjoint. We prove that the statement of the lemma holds for $Q + 1$.

The element $x_2 + s_{Q+1}$ is larger than every element of the two multi-sets of size $Q$,
since each of those elements has the form $x + s_j$ with $x \le x_2$ and $s_j < s_{Q+1}$.
Hence, it does not belong to either multi-set. If $x_1 + s_{Q+1}$ does not belong to either
multi-set either, then we can add the new elements $x_1 + s_{Q+1}$ and $x_2 + s_{Q+1}$ to the
multi-sets of size $Q$ arbitrarily (one element to each multi-set); the resulting multi-sets
of size $Q + 1$ are disjoint. If $x_1 + s_{Q+1}$ belongs to one of the multi-sets of size $Q$,
we add it to that multi-set and add $x_2 + s_{Q+1}$ to the other multi-set to form the
new disjoint multi-sets of size $Q + 1$. The two $(Q+1)$-tuples (the two input symbols
of the associated channel) are then obtained from the two multi-sets of size $Q + 1$ by
subtracting the interference symbols from their elements. ■

¹A multi-set differs from a set in that each member may have a multiplicity greater than one. For example, {1, 3, 3, 7}
is a multi-set of size four in which 3 has multiplicity two.

Lemma 1 is in fact a special case of Theorem 2 in [15], which was stated in the
context of capacity.
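The induction in the proof of Lemma 1 is constructive, so it can be turned into a small procedure. The sketch below is one reading of that construction, assuming $x_1 < x_2$ and processing the states in increasing order; `disjoint_pair` is a hypothetical name. For $\mathcal{X} = \{-1, +1\}$ and $\mathcal{S} = \{-1, 0, +1\}$ it happens to return the pair used in the second example of this section.

```python
def d_si(t, r, S):
    """Distance (9) for tuples t, r aligned with the ordered states S."""
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


def disjoint_pair(x1, x2, S):
    """Build t, r whose multisets {t(s)+s} and {r(s)+s} are disjoint.

    Follows the induction of Lemma 1 (assumes x1 < x2). For each new state s,
    the element x2 + s exceeds every element placed so far, so only x1 + s
    can collide; it is put into the multiset that already contains it.
    """
    states = sorted(S)
    t, r = [x1], [x2]                      # base case Q = 1
    A, B = [x1 + states[0]], [x2 + states[0]]
    for s in states[1:]:
        low, high = x1 + s, x2 + s
        if low in B:                       # low joins B, high goes to A
            r.append(x1)
            B.append(low)
            t.append(x2)
            A.append(high)
        else:                              # low is in A or in neither
            t.append(x1)
            A.append(low)
            r.append(x2)
            B.append(high)
    return tuple(t), tuple(r), tuple(states)


t, r, states = disjoint_pair(-1, +1, (-1, 0, +1))
print(t, r, d_si(t, r, states))   # (-1, -1, 1) (1, 1, -1) 1
```

The construction only guarantees a nonzero distance; it makes no claim that the resulting pair has the maximum distance in general.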

Let $u_1$ and $u_2$ be two input symbols of the associated channel with the maximum
distance among all pairs of input symbols of the associated channel. Since $d_{\mathrm{SI}}(u_1, u_2) > 0$
(by Lemma 1), we have $u_1(s) \neq u_2(s)$ for all $s \in \mathcal{S}$; otherwise, from (9),
$d_{\mathrm{SI}}(u_1, u_2) = 0$. We choose an arbitrary interference symbol $s \in \mathcal{S}$ to partition $\mathcal{T}$ as
follows. We put $t \in \mathcal{T}$ in $\mathcal{T}_1$ if $t(s) = u_1(s)$; otherwise (i.e., if $t(s) = u_2(s)$) we put $t$ in
$\mathcal{T}_2$. Note that the distance between any two symbols in $\mathcal{T}_j$ is zero, $j = 1, 2$, since both
take the value $u_j(s)$ at the chosen state $s$ and thus share the element $u_j(s) + s$.

Suppose that a codebook is designed for the binary channel with codewords
composed of elements of $\mathcal{T}$. We construct a new codebook from the original one by replacing
the elements of the codewords that belong to $\mathcal{T}_1$ by $u_1$ and the elements that belong to
$\mathcal{T}_2$ by $u_2$. Since the codewords of the new codebook are composed of just two elements,
we may call the new code a binary code.

Theorem 1: The distance spectrum of the binary code constructed by the procedure 
described above is at least as good as the distance spectrum of the original code. 

Proof: Consider any two codewords $(t_1, \ldots, t_n)$ and $(r_1, \ldots, r_n)$ from the original
codebook, where $t_i, r_i \in \mathcal{T}$. The squared distance between the two codewords is equal
to $\sum_{i=1}^n d_{\mathrm{SI}}^2(t_i, r_i)$. For any $i \in \{1, 2, \ldots, n\}$, we consider two cases:

Case 1: $t_i$ and $r_i$ belong to the same partition. Then $d_{\mathrm{SI}}(t_i, r_i) = 0$, and both are replaced
by the same symbol, so the replacement does not change this term.

Case 2: $t_i$ and $r_i$ belong to different partitions. Then, since $d_{\mathrm{SI}}(t_i, r_i) \le d_{\mathrm{SI}}(u_1, u_2)$,
the replacement does not decrease this term. ■
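Theorem 1 can be spot-checked numerically. The sketch below assumes $\mathcal{X} = \{-1, +1\}$, $\mathcal{S} = \{-1, 0, +1\}$ with the maximum-distance pair $u_1 = (-1, -1, +1)$, $u_2 = (+1, +1, -1)$ from the example later in this section, partitions $\mathcal{T}$ using the first state, binarizes a randomly drawn codebook, and asserts that no pairwise squared distance decreases. The random codebook is purely illustrative.

```python
import random
from itertools import product

X, S = (-1, +1), (-1, 0, +1)
T = list(product(X, repeat=len(S)))
u1, u2 = (-1, -1, +1), (+1, +1, -1)   # a maximum-distance pair for these alphabets
K_STATE = 0                            # partition T by the value at state S[0]


def d_si(t, r):
    """Distance (9) for tuples aligned with S."""
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


def sq_dist(c1, c2):
    """Squared codeword distance: sum over positions of d_SI^2."""
    return sum(d_si(t, r) ** 2 for t, r in zip(c1, c2))


def binarize(t):
    """Map t to u1 if it agrees with u1 at the chosen state (t in T_1), else to u2."""
    return u1 if t[K_STATE] == u1[K_STATE] else u2


rng = random.Random(1)
code = [tuple(rng.choice(T) for _ in range(6)) for _ in range(8)]
binary_code = [tuple(binarize(t) for t in c) for c in code]

for a in range(len(code)):
    for b in range(a + 1, len(code)):
        assert sq_dist(binary_code[a], binary_code[b]) >= sq_dist(code[a], code[b])
print("no pairwise distance decreased")
```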

According to Theorem 1, as far as the distance spectrum of the code is concerned,
it is sufficient to use two symbols of $\mathcal{T}$ with the maximum distance, namely $u_1$ and
$u_2$, in the encoding for a binary channel. Since $\mathcal{T}$ has size $2^Q$ for the binary channel,
a brute-force search for two symbols in $\mathcal{T}$ with the maximum distance has
exponential complexity in $Q$. In Appendix II, we propose an algorithm with
polynomial complexity for finding two symbols with the maximum distance.
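For small $Q$, the maximum-distance pair can simply be found by exhaustive search. The sketch below does exactly that; it is a brute-force reference whose cost grows like $4^Q$ pairs, not the polynomial-time algorithm of the appendix, and the function name is hypothetical.

```python
from itertools import product


def d_si(t, r, S):
    """Distance (9) for tuples t, r aligned with the ordered states S."""
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


def max_distance_pair(X, S):
    """Exhaustively search T x T for a pair with maximum d_SI.

    For binary X this visits 4**Q pairs, so it is only practical for small Q.
    Ties are broken by returning the first maximal pair encountered.
    """
    T = list(product(X, repeat=len(S)))
    best = max(((t, r) for t in T for r in T),
               key=lambda pair: d_si(pair[0], pair[1], S))
    return best, d_si(best[0], best[1], S)


print(max_distance_pair((-1, +1), (-1, +1)))      # (((-1, 1), (1, -1)), 2)
print(max_distance_pair((-1, +1), (-1, 0, +1)))   # (((-1, -1, 1), (1, 1, -1)), 1)
```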

Since it is sufficient to use $u_1$ and $u_2$ in the encoding for the binary channel, we
can define the Hamming distance between any two codewords as the number
of positions at which they differ. Consider two codewords $c_1 = (t_1, \ldots, t_n)$
and $c_2 = (r_1, \ldots, r_n)$ with elements from the binary set $\{u_1, u_2\}$. The squared
distance between these codewords is given by

$$\sum_{i=1}^n d_{\mathrm{SI}}^2(t_i, r_i) = d_{\mathrm{SI}}^2(u_1, u_2)\, d_H(c_1, c_2), \tag{12}$$

where $d_H(c_1, c_2)$ is the Hamming distance between $c_1$ and $c_2$. Therefore, the problem of
designing codes for the binary channel where the interference sequence is known causally
at the encoder reduces to the design of codes for the binary symmetric channel. The only
difference is that the coding is over the set $\{u_1, u_2\}$ rather than $\{0, 1\}$.
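Identity (12) can be checked on a toy example. The sketch assumes the pair $u_1 = (-1, -1, +1)$, $u_2 = (+1, +1, -1)$ for $\mathcal{S} = \{-1, 0, +1\}$, maps bit 0 to $u_1$ and bit 1 to $u_2$, and compares both sides of (12) for two arbitrary binary words.

```python
S = (-1, 0, +1)
U = {0: (-1, -1, +1), 1: (+1, +1, -1)}   # bit -> associated-channel symbol


def d_si(t, r):
    """Distance (9) for tuples aligned with S."""
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


def hamming(c1, c2):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(c1, c2))


c1, c2 = (0, 1, 1, 0, 1), (1, 1, 0, 0, 0)
lhs = sum(d_si(U[a], U[b]) ** 2 for a, b in zip(c1, c2))
rhs = d_si(U[0], U[1]) ** 2 * hamming(c1, c2)
print(lhs, rhs)   # 3 3
```

Positions where the bits agree contribute $d_{\mathrm{SI}}(u_j, u_j) = 0$, which is why only the Hamming distance survives.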

A. Comparison with the Interference-Free Channel 

If we were to use a binary code for the interference-free binary channel with the
input alphabet $\mathcal{X} = \{x_1, x_2\}$, then the squared Euclidean distance between any two codewords
$c_1$ and $c_2$ of length $n$ for the interference-free channel would be

$$d_E^2(c_1, c_2) = (x_1 - x_2)^2\, d_H(c_1, c_2), \tag{13}$$

where $d_E$ denotes the Euclidean distance.

Using (12) and (13), we can compare the performance of a zero-one binary code for
the binary channel with causal side information at the encoder with that of the same
zero-one binary code for the interference-free binary channel. In the case of the channel with side
information, zero and one are mapped to $u_1$ and $u_2$, and in the case of the
interference-free channel, zero and one are mapped to $x_1$ and $x_2$, respectively. Note that $u_1$ and $u_2$ are
functions from the interference alphabet $\mathcal{S}$ to the channel input alphabet $\mathcal{X} = \{x_1, x_2\}$.
It is clear from (9), by taking $s_1 = s_2$, that

$$d_{\mathrm{SI}}(u_1, u_2) \le |x_1 - x_2|. \tag{14}$$

Therefore, by (12) and (13), the distance spectrum of the code for the
interference-free channel is at least as good as the distance spectrum of the code for the channel
with known interference at the encoder. Of course, this is not surprising. However, it is
interesting to find the conditions under which (14) holds with equality.

If (14) holds with equality, the distance spectra of the two codes are the
same. In other words, the knowledge of interference at the encoder then enables us to
achieve the same performance (in terms of the order of the error probability) at high SNR
as in the interference-free case.

We explicitly denote the interference alphabet by $\mathcal{S} = \{s_1, \ldots, s_Q\}$, where
$s_1 < s_2 < \cdots < s_Q$. Then the following theorem holds.

Theorem 2: $d_{\mathrm{SI}}(u_1, u_2) = |x_1 - x_2|$ if and only if

$$\min_{i \neq j} |s_i - s_j| \ge |x_1 - x_2|.$$
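Proof: If $\min_{i \neq j} |s_i - s_j| \ge |x_1 - x_2|$, we may take $u_1 = (x_1, x_2, x_1, \ldots)$ and
$u_2 = (x_2, x_1, x_2, \ldots)$, i.e., $u_1(s_i) = x_1$ and $u_2(s_i) = x_2$ for odd $i$, and $u_1(s_i) = x_2$ and
$u_2(s_i) = x_1$ for even $i$. Then

$$\begin{aligned}
d_{\mathrm{SI}}(u_1, u_2) &= \min_{i, j} \big| u_1(s_i) + s_i - u_2(s_j) - s_j \big| \\
&= \min\Big\{ |x_1 - x_2|,\ \min_{\substack{i \neq j \\ i \equiv j \ (\mathrm{mod}\ 2)}} |x_1 - x_2 + s_i - s_j|,\ \min_{i \not\equiv j \ (\mathrm{mod}\ 2)} |s_i - s_j| \Big\}.
\end{aligned} \tag{15}$$

We also have

$$\begin{aligned}
|x_1 - x_2 + s_i - s_j| &\ge |s_i - s_j| - |x_1 - x_2| \\
&\ge 2 \min_{k \neq l} |s_k - s_l| - |x_1 - x_2| \quad \text{for } i \neq j,\ i \equiv j \ (\mathrm{mod}\ 2) \\
&\ge |x_1 - x_2|,
\end{aligned} \tag{16}$$

where the second inequality holds because two distinct indices of the same parity differ
by at least two, so at least one other element of $\mathcal{S}$ lies between $s_i$ and $s_j$, and

$$|s_i - s_j| \ge \min_{k \neq l} |s_k - s_l| \ge |x_1 - x_2| \quad \text{for } i \not\equiv j \ (\mathrm{mod}\ 2). \tag{17}$$

Therefore, $d_{\mathrm{SI}}(u_1, u_2) = |x_1 - x_2|$. Since (14) holds for every pair of symbols, this pair
attains the maximum distance.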

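For the other direction, suppose that $\min_{i \neq j} |s_i - s_j| < |x_1 - x_2|$. We show that
$d_{\mathrm{SI}}(t_1, t_2) < |x_1 - x_2|$ for all $t_1, t_2 \in \mathcal{T}$, so in particular $d_{\mathrm{SI}}(u_1, u_2) < |x_1 - x_2|$. Suppose
that $s_k, s_{k+1} \in \mathcal{S}$ achieve the minimum of $|s_i - s_j|$ and that $t_1$ and $t_2$ are arbitrary
elements of $\mathcal{T}$. If $t_1(s_k) = t_2(s_k)$ or $t_1(s_{k+1}) = t_2(s_{k+1})$, then $d_{\mathrm{SI}}(t_1, t_2) = 0$. Otherwise,
up to exchanging the roles of $t_1$ and $t_2$, we have two non-trivial cases (recall $x_1 < x_2$):

Case 1: $t_1(s_k) = t_1(s_{k+1}) = x_1$ and $t_2(s_k) = t_2(s_{k+1}) = x_2$. Then
$d_{\mathrm{SI}}(t_1, t_2) \le |t_1(s_{k+1}) + s_{k+1} - t_2(s_k) - s_k| = |x_1 - x_2 + s_{k+1} - s_k| < |x_1 - x_2|$.

Case 2: $t_1(s_k) = x_1$, $t_1(s_{k+1}) = x_2$ and $t_2(s_k) = x_2$, $t_2(s_{k+1}) = x_1$. Then
$d_{\mathrm{SI}}(t_1, t_2) \le |t_1(s_k) + s_k - t_2(s_{k+1}) - s_{k+1}| = |s_k - s_{k+1}| < |x_1 - x_2|$. ■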
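A brute-force check of Theorem 2 on a few small alphabets may help build intuition. The sketch below compares the maximum of $d_{\mathrm{SI}}$ over all pairs of $\mathcal{T}$ with $|x_1 - x_2|$ and tests the gap condition; the interference alphabets are arbitrary examples chosen for illustration.

```python
from itertools import product


def d_si(t, r, S):
    """Distance (9) for tuples t, r aligned with the ordered states S."""
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


def max_distance(X, S):
    """Maximum d_SI over all pairs of associated-channel symbols."""
    T = list(product(X, repeat=len(S)))
    return max(d_si(t, r, S) for t in T for r in T)


def min_gap(S):
    """Smallest distance between two distinct states (needs |S| >= 2)."""
    s = sorted(S)
    return min(b - a for a, b in zip(s, s[1:]))


X = (-1, +1)
for S in [(-1, +1), (-1, 0, +1), (-3, 0, +3), (-2, 0, +1)]:
    equality = max_distance(X, S) == abs(X[0] - X[1])
    condition = min_gap(S) >= abs(X[0] - X[1])
    print(S, equality, condition)
    assert equality == condition   # Theorem 2
```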

As an example, consider a binary channel with $\mathcal{X} = \mathcal{S} = \{-1, +1\}$ and equiprobable
interference symbols. The two symbols with the maximum distance in the input
alphabet of the associated channel are $u_1 = (-1, +1)$ and $u_2 = (+1, -1)$. We have simulated
the error probability performance of this uncoded system with maximum likelihood
decoding. The error probability vs. SNR ($= 1/\sigma^2$) for this channel is plotted in Fig. 3.
The error probability curve for the interference-free channel with $\mathcal{X} = \{-1, +1\}$ is
plotted for comparison; for the interference-free channel, $P_e = Q(1/\sigma)$. It is easy to check
that, in this example, $d_{\mathrm{SI}}(u_1, u_2) = |x_1 - x_2| = 2$. As can be seen, the two error probability
curves decay at the same rate with increasing SNR, as expected. The error probability
curve for the scenario in which the interference is not known at the encoder is also plotted for
comparison. In this scenario, the error probability curve reaches an error floor.

Another example is illustrated in Fig. 4. For this example, $\mathcal{X} = \{-1, +1\}$ and
$\mathcal{S} = \{-1, 0, +1\}$. We can find by inspection two symbols of the associated channel input
alphabet with the maximum distance: $u_1 = (-1, -1, +1)$ and $u_2 = (+1, +1, -1)$. Here,
we have $d_{\mathrm{SI}}(u_1, u_2) = 1 < |x_1 - x_2| = 2$. Therefore, the error probability curve for
the channel with known interference at the encoder does not decay as fast as the error
probability curve for the interference-free channel. For the scenario in which the interference
is not known at the encoder, the error probability curve reaches an error floor.

Fig. 3. Error probability vs. SNR for the binary-input AWGN channel with/without known/unknown interference. $\mathcal{X} = \mathcal{S} = \{-1, +1\}$.

Fig. 4. Error probability vs. SNR for the binary-input AWGN channel with/without known/unknown interference. $\mathcal{X} = \{-1, +1\}$, $\mathcal{S} = \{-1, 0, +1\}$.

VI. The M-ary Channel

In general, the statement of Theorem 1 does not extend to the case with $M > 2$
channel input symbols. In fact, by using more than $M$ input symbols of the associated
channel, we can obtain a codebook with a better distance spectrum than any
codebook composed of just $M$ input symbols of the associated channel. An example
showing this is given in Appendix III. However, under some conditions on the channel
input and interference alphabets, the statement of Theorem 1 can be generalized to the
case with $M > 2$.

Theorem 3: As far as the distance spectrum of the code is concerned, it is sufficient to
use $M$ (out of $M^Q$) input symbols of the associated channel in the encoding if

$$\min_{i \neq j} |s_i - s_j| \ge 2 \max_{i, j} |x_i - x_j|.$$
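Proof: Consider the $M$ input symbols of the associated channel $u_1 = (x_1, \ldots, x_1)$,
$u_2 = (x_2, \ldots, x_2)$, ..., $u_M = (x_M, \ldots, x_M)$. We use these symbols to partition the
associated channel input alphabet $\mathcal{T}$ as follows. Put $t \in \mathcal{T}$ in $\mathcal{T}_i$ if the first element of
$t$ is $x_i$, $i = 1, 2, \ldots, M$. Note that $\mathcal{T}_i$ has size $M^{Q-1}$ and the distance between any two
symbols in $\mathcal{T}_i$ is zero, $i = 1, 2, \ldots, M$. For any $p, q = 1, \ldots, M$, we have

$$\begin{aligned}
d_{\mathrm{SI}}(u_p, u_q) &= \min_{k_1, k_2} |x_p + s_{k_1} - x_q - s_{k_2}| \\
&= \min\Big\{ |x_p - x_q|,\ \min_{k_1 \neq k_2} |x_p + s_{k_1} - x_q - s_{k_2}| \Big\}.
\end{aligned} \tag{18}$$

We also have

$$\begin{aligned}
|x_p + s_{k_1} - x_q - s_{k_2}| &\ge |s_{k_1} - s_{k_2}| - |x_p - x_q| \\
&\ge 2 \max_{i, j} |x_i - x_j| - |x_p - x_q| \quad \text{for } k_1 \neq k_2 \\
&\ge |x_p - x_q|.
\end{aligned} \tag{19}$$

Therefore, $d_{\mathrm{SI}}(u_p, u_q) = |x_p - x_q|$. Note that the distance between any symbol of $\mathcal{T}_p$ and
any symbol of $\mathcal{T}_q$ is at most $|x_p - x_q| = d_{\mathrm{SI}}(u_p, u_q)$, since the two symbols take the values $x_p$
and $x_q$ at the first interference symbol $s_1$.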

Suppose that a codebook is designed with codewords composed of possibly all
elements of $\mathcal{T}$. We construct a new codebook from the original one by replacing the
elements of the codewords that belong to $\mathcal{T}_i$ by $u_i$, $i = 1, 2, \ldots, M$. As in the proof of
Theorem 1, it is easy to check that the distance spectrum of the new code is at least as good
as the distance spectrum of the original code. ■
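The two facts used in this proof can be verified by enumeration for one small case. The sketch assumes the hypothetical alphabets $\mathcal{X} = \{-1, 0, +1\}$ and $\mathcal{S} = \{-4, 0, +4\}$, which satisfy the condition of Theorem 3 (minimum state gap 4, twice the largest input gap 2), and checks (18) and (19) together with the bound on distances between $\mathcal{T}_p$ and $\mathcal{T}_q$.

```python
from itertools import product

X, S = (-1, 0, +1), (-4, 0, +4)   # min state gap 4 >= 2 * max input gap 2
T = list(product(X, repeat=len(S)))


def d_si(t, r):
    """Distance (9) for tuples aligned with S."""
    return min(abs(t[i] + S[i] - r[j] - S[j])
               for i in range(len(S)) for j in range(len(S)))


constant = {x: (x,) * len(S) for x in X}   # u_i = (x_i, ..., x_i)
for xp in X:
    for xq in X:
        # Constant symbols achieve the full input distance, as in (18) and (19).
        assert d_si(constant[xp], constant[xq]) == abs(xp - xq)
        # No pair drawn from T_p x T_q (classes by first entry) is farther apart.
        Tp = [t for t in T if t[0] == xp]
        Tq = [t for t in T if t[0] == xq]
        assert max(d_si(t, r) for t in Tp for r in Tq) <= abs(xp - xq)
print("checks passed for", X, S)
```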

According to Theorem 3, it is sufficient to use only the symbols $u_1, \ldots, u_M$ in the
encoding. But each of these symbols is a constant function from $\mathcal{S}$ to $\mathcal{X}$. Therefore,
the same symbol enters the channel regardless of the current interference symbol. This
suggests that the knowledge of interference symbols at the encoder does not help to
improve the distance spectrum when the condition of Theorem 3 is satisfied.
In fact, under the condition of Theorem 3, we have

$$d_{\mathrm{SI}}(u_i, u_j) = d(x_i, x_j) = d_E(x_i, x_j), \qquad i, j = 1, \ldots, M, \tag{20}$$

where $d(\cdot, \cdot)$, defined in (11), is the distance measure when the interference is not known at
the encoder and $d_E(\cdot, \cdot)$ is the Euclidean distance. Therefore, at high SNR, the error probability
performance of a code for the channel with known or unknown interference at the encoder
is the same as the performance of the same code for the interference-free channel.

It is worth mentioning that for the above-mentioned three scenarios the codes for 
the interference-free channel, the channel with known interference at the encoder, and 
the channel with unknown interference use the same transmitted power. 
VII. A More General Channel Model

Although we have considered the AWGN channel with additive interference so far,
our treatment applies to more general channels characterized by

$$Y = f(X, S) + N, \tag{21}$$

where $f$ is an arbitrary function of two variables, $S$ is the channel state, which is known
causally at the encoder, $X$ is the channel input, and $N$ is white Gaussian noise. Another
special case of this more general channel is the fast fading channel

$$Y = SX + N, \tag{22}$$

where $S$ is the fading coefficient. For the general channel model (21), the distance between
two symbols $t$ and $r$ of $\mathcal{T}$ is defined as

$$d_{\mathrm{SI}}(t, r) = \min_{s_1, s_2 \in \mathcal{S}} \big| f\big(t(s_1), s_1\big) - f\big(r(s_2), s_2\big) \big|. \tag{23}$$

Theorem 1 on the binary channel also holds for the general channel model. However,
the maximum distance among pairs of symbols of $\mathcal{T}$ may be zero; i.e., Lemma 1 does
not hold in general. Theorems 2 and 3 do not hold for the more general channel
model in (21); they are specific to the AWGN channel model with additive interference.
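The generalized distance (23) only changes how the noiseless output points are formed. The sketch below evaluates it for the fast fading channel (22) with two hypothetical fading alphabets; with a zero fading coefficient every symbol of $\mathcal{T}$ shares the output point 0, so the maximum distance is zero, which illustrates why Lemma 1 need not hold for (21). The additive case is included to show that (23) reduces to (9).

```python
from itertools import product


def d_general(t, r, S, f):
    """Distance (23): min over s1, s2 of |f(t(s1), s1) - f(r(s2), s2)|."""
    return min(abs(f(t[i], S[i]) - f(r[j], S[j]))
               for i in range(len(S)) for j in range(len(S)))


def fading(x, s):
    """Noiseless output of the fast fading channel (22)."""
    return s * x


def additive(x, s):
    """Noiseless output of the additive channel (3); recovers (9)."""
    return x + s


def max_distance(X, S, f):
    """Maximum of d_general over all pairs of associated-channel symbols."""
    T = list(product(X, repeat=len(S)))
    return max(d_general(t, r, S, f) for t in T for r in T)


X = (-1, +1)
print(max_distance(X, (0.0, 1.0), fading))    # 0.0
print(max_distance(X, (0.5, 1.0), fading))    # 1.0
print(max_distance(X, (-1, +1), additive))    # 2
```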

VIII. Conclusion 

In this paper, we derived the code design criterion at high SNR for the $M$-ary input
AWGN channel with additive $Q$-level interference, where the sequence of interference
symbols is known causally at the encoder. The code design is over an input alphabet
$\mathcal{T}$ of size $M^Q$. The performance of a code for our channel at high SNR is governed
by the minimum distance between the codewords with elements from $\mathcal{T}$. We may not
need to use all symbols of $\mathcal{T}$ in the encoding. In particular, we showed that for the case
$M = 2$, as far as the distance spectrum of the code is concerned, we need to use only
two symbols of $\mathcal{T}$ with the maximum distance among all pairs of symbols. This reduces
the code design problem for our channel to code design for the binary symmetric channel,
which has been well researched in the literature.
Appendix I

Derivation of Code Design Criterion at high SNR 

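Define

$$\mathcal{A}_i = \{t_i(s) + s : s \in \mathcal{S}\}, \qquad i = 1, \ldots, n, \tag{24}$$

$$\mathcal{B}_i = \{r_i(s) + s : s \in \mathcal{S}\}, \qquad i = 1, \ldots, n. \tag{25}$$

It is worth mentioning that the cardinality of $\mathcal{A}_i$ (or $\mathcal{B}_i$) can be less than $Q$, $i = 1, \ldots, n$,
since different interference symbols may yield the same element of $\mathcal{A}_i$ (or $\mathcal{B}_i$). For any
$i = 1, \ldots, n$, we have

$$\sum_{s \in \mathcal{S}} p(s)\, f_N\big(y - t_i(s) - s\big) = \sum_{a \in \mathcal{A}_i} p(a)\, f_N(y - a), \tag{26}$$

$$\sum_{s \in \mathcal{S}} p(s)\, f_N\big(y - r_i(s) - s\big) = \sum_{b \in \mathcal{B}_i} p(b)\, f_N(y - b), \tag{27}$$

where $p(a)$ and $p(b)$ are obtained from $p(s)$ according to

$$p(a) = \sum_{s \in \mathcal{S} :\, t_i(s) + s = a} p(s), \tag{28}$$

$$p(b) = \sum_{s \in \mathcal{S} :\, r_i(s) + s = b} p(s). \tag{29}$$

For any sequences $a_1^n = a_1 \cdots a_n \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ and $b_1^n = b_1 \cdots b_n \in \mathcal{B}_1 \times \cdots \times \mathcal{B}_n$,
we define the events

$$E_1(a_1^n) = \bigcap_{i=1}^n \Big\{ a_i = \arg\min_{a \in \mathcal{A}_i} |y_i - a| \Big\}, \tag{30}$$

$$E_2(b_1^n) = \bigcap_{i=1}^n \Big\{ b_i = \arg\min_{b \in \mathcal{B}_i} |y_i - b| \Big\}, \tag{31}$$

given that $w_1$ has been sent and the interference sequence $s_1^n$ has occurred. The event
$E_1(a_1^n)$ simply means that $a_i$ is the closest point to the received signal $y_i$ (given that $w_1$ has
been sent and the interference sequence $s_1^n$ has occurred) among all points of $\mathcal{A}_i$, for all
$i = 1, \ldots, n$.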
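Any term in the expression for the error probability can be written as

$$
\begin{aligned}
&\Pr\Big\{ \prod_{i=1}^{n} \sum_{a \in A_i} p(a) f_N(y_i - a) < \prod_{i=1}^{n} \sum_{b \in B_i} p(b) f_N(y_i - b) \,\Big|\, w_1, s_1^n \Big\} \\
&= \sum_{a_1^n} \sum_{b_1^n} \Pr\Big\{ \prod_{i=1}^{n} \sum_{a \in A_i} p(a) f_N(y_i - a) < \prod_{i=1}^{n} \sum_{b \in B_i} p(b) f_N(y_i - b),\, E_1(a_1^n), E_2(b_1^n) \,\Big|\, w_1, s_1^n \Big\} \\
&= \sum_{a_1^n} \sum_{b_1^n} \Pr\Big\{ \prod_{i=1}^{n} f_N(y_i - a_i) \prod_{i=1}^{n} \Big( \sum_{a \in A_i} p(a) \frac{f_N(y_i - a)}{f_N(y_i - a_i)} \Big) < \prod_{i=1}^{n} f_N(y_i - b_i) \prod_{i=1}^{n} \Big( \sum_{b \in B_i} p(b) \frac{f_N(y_i - b)}{f_N(y_i - b_i)} \Big),\, E_1(a_1^n), E_2(b_1^n) \,\Big|\, w_1, s_1^n \Big\} \\
&= \sum_{a_1^n} \sum_{b_1^n} \Pr\Big\{ \sum_{i=1}^{n} (y_i - a_i)^2 > \sum_{i=1}^{n} (y_i - b_i)^2 + K\sigma^2,\, E_1(a_1^n), E_2(b_1^n) \,\Big|\, w_1, s_1^n \Big\},
\end{aligned}
\tag{32}
$$

where the sums run over $a_1^n \in A_1 \times \cdots \times A_n$ and $b_1^n \in B_1 \times \cdots \times B_n$, and $K = K(y_1^n, a_1^n, b_1^n)$ is given by

$$K(y_1^n, a_1^n, b_1^n) = 2 \log \frac{\prod_{i=1}^{n} \sum_{a \in A_i} p(a) \dfrac{f_N(y_i - a)}{f_N(y_i - a_i)}}{\prod_{i=1}^{n} \sum_{b \in B_i} p(b) \dfrac{f_N(y_i - b)}{f_N(y_i - b_i)}}. \tag{33}$$

Given the events $E_1(a_1^n)$ and $E_2(b_1^n)$, it is easy to check that $K(y_1^n, a_1^n, b_1^n)$ is bounded as

$$K_1(a_1^n) = 2 \sum_{i=1}^{n} \log p(a_i) \le K(y_1^n, a_1^n, b_1^n) \le K_2(b_1^n) = 2 \sum_{i=1}^{n} \log \frac{1}{p(b_i)}. \tag{34}$$

Indeed, given $E_1(a_1^n)$, $a_i$ is the point of $A_i$ closest to $y_i$, so $f_N(y_i - a) \le f_N(y_i - a_i)$ for every $a \in A_i$, and each factor in the numerator of (33) lies between $p(a_i)$ (the term $a = a_i$ alone) and 1; likewise, each factor in the denominator lies between $p(b_i)$ and 1.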
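The bounds in (34) can be checked numerically. The sketch below is illustrative only: it assumes real-valued points, Gaussian noise with standard deviation σ, natural logarithms, and sets $A_i$, $B_i$ stored as {point: probability} dictionaries as in (28) and (29); the helper names and the example sets are hypothetical. It picks $a_i$ and $b_i$ as the points nearest to $y_i$, so that $E_1(a_1^n)$ and $E_2(b_1^n)$ hold, and returns $K$ from (33) together with $K_1$ and $K_2$.

```python
import math


def gauss_pdf(x, sigma):
    """Zero-mean Gaussian density f_N(x) with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def nearest(points, y):
    """Point of the given set closest to y (the argmin inside (30) and (31))."""
    return min(points, key=lambda point: abs(y - point))


def k_with_bounds(y, A, B, sigma):
    """Evaluate K from (33) and the bounds K_1, K_2 from (34).

    A and B are lists (one entry per i) of {point: probability} dicts.
    a_i and b_i are taken as the nearest points to y_i, so E_1 and E_2 hold.
    """
    log_ratio_a = log_ratio_b = k1 = k2 = 0.0
    for yi, Ai, Bi in zip(y, A, B):
        ai, bi = nearest(Ai, yi), nearest(Bi, yi)
        num = sum(p * gauss_pdf(yi - a, sigma) for a, p in Ai.items())
        den = sum(p * gauss_pdf(yi - b, sigma) for b, p in Bi.items())
        log_ratio_a += math.log(num / gauss_pdf(yi - ai, sigma))
        log_ratio_b += math.log(den / gauss_pdf(yi - bi, sigma))
        k1 += 2 * math.log(Ai[ai])
        k2 += 2 * math.log(1 / Bi[bi])
    return 2 * (log_ratio_a - log_ratio_b), k1, k2


# Hypothetical sets for n = 2.
A = [{4: 0.5, 5: 0.5}, {5: 1.0}]
B = [{5: 0.5, 8: 0.5}, {1: 0.5, 9: 0.5}]
K, K1, K2 = k_with_bounds([4.2, 2.5], A, B, sigma=1.0)
```

Because $K_1$ and $K_2$ depend only on the probabilities $p(a_i)$ and $p(b_i)$, they do not change as σ shrinks, which is what the high-SNR approximation relies on.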

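As we consider the high SNR regime, we may assume that the noise power $\sigma^2$ is sufficiently small. Since $K$ is bounded by (34) independently of $\sigma^2$, the term $K\sigma^2$ in (32) becomes negligible, and the error probability term in (32) can be well approximated by

$$\sum_{a_1^n} \sum_{b_1^n} \Pr\Big\{ \sum_{i=1}^{n} (y_i - a_i)^2 > \sum_{i=1}^{n} (y_i - b_i)^2,\, E_1(a_1^n), E_2(b_1^n) \,\Big|\, w_1, s_1^n \Big\}. \tag{35}$$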
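Any term in the summation (35) can be upper bounded as

$$
\begin{aligned}
&\Pr\Big\{ \sum_{i=1}^{n} (y_i - a_i)^2 > \sum_{i=1}^{n} (y_i - b_i)^2,\, E_1(a_1^n), E_2(b_1^n) \,\Big|\, w_1, s_1^n \Big\} \\
&\le \Pr\Big\{ \sum_{i=1}^{n} (y_i - c_i)^2 > \sum_{i=1}^{n} (y_i - b_i)^2,\, E_1(a_1^n), E_2(b_1^n) \,\Big|\, w_1, s_1^n \Big\} \\
&\le \Pr\Big\{ \sum_{i=1}^{n} (y_i - c_i)^2 > \sum_{i=1}^{n} (y_i - b_i)^2 \,\Big|\, w_1, s_1^n \Big\} \\
&= Q\bigg( \frac{\sqrt{\sum_{i=1}^{n} |c_i - b_i|^2}}{2\sigma} \bigg)
\le Q\bigg( \frac{\sqrt{\sum_{i=1}^{n} d^2(t_i, r_i)}}{2\sigma} \bigg),
\end{aligned}
\tag{36}
$$

where

$$c_i = t_i(s_i) + s_i, \quad i = 1, \ldots, n. \tag{37}$$

The first inequality is due to the fact that, given $E_1(a_1^n)$, we have $|y_i - a_i| \le |y_i - c_i|$, $i = 1, \ldots, n$, because $c_i \in A_i$. The equality holds because, given $w_1$ and $s_1^n$, $y_1^n$ is Gaussian centered at $c_1^n$, and the last inequality follows from $|c_i - b_i| \ge d(t_i, r_i)$, since $c_i \in A_i$ and $b_i \in B_i$.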
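The equality in (36), $\Pr\{\sum_i (y_i - c_i)^2 > \sum_i (y_i - b_i)^2 \mid w_1, s_1^n\} = Q(\|c_1^n - b_1^n\|/(2\sigma))$, can be sanity-checked by simulation. The Python sketch below assumes real-valued symbols and i.i.d. Gaussian noise of standard deviation σ; the points $c = (0, 0)$, $b = (1, 1)$ and σ = 0.5 are arbitrary illustrative values, and the helper names are hypothetical. `nearest_pair_distance` evaluates $\min |a - b|$ over two point sets, which lower-bounds $|c_i - b_i|$ in the last step of (36).

```python
import math
import random


def qfunc(x):
    """Gaussian tail function Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def nearest_pair_distance(A, B):
    """min |a - b| over a in A, b in B: a lower bound on |c_i - b_i| when c_i is in A and b_i in B."""
    return min(abs(a - b) for a in A for b in B)


def pairwise_error_mc(c, b, sigma, trials=100_000, seed=1):
    """Monte Carlo estimate of Pr{ sum (y_i - c_i)^2 > sum (y_i - b_i)^2 } with y = c + noise."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        y = [ci + rng.gauss(0.0, sigma) for ci in c]
        if sum((yi - ci) ** 2 for yi, ci in zip(y, c)) > sum((yi - bi) ** 2 for yi, bi in zip(y, b)):
            errors += 1
    return errors / trials


c, b, sigma = [0.0, 0.0], [1.0, 1.0], 0.5
exact = qfunc(math.dist(c, b) / (2 * sigma))  # closed form used in (36)
estimate = pairwise_error_mc(c, b, sigma)     # should agree up to Monte Carlo error
```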

In the following, we show that the upper bound (36) is tight for the term(s) in the summation (35) satisfying

$$\{a_i, b_i\} = \arg\min_{a \in A_i,\, b \in B_i} |a - b|, \quad i = 1, \ldots, n, \tag{38}$$

and

$$a_i = c_i, \quad i = 1, \ldots, n. \tag{39}$$

Any term in (35) equals the integral of the joint probability density of $y_1^n = y_1 \cdots y_n$ (given $w_1$, $s_1^n$) over the region in the $n$-dimensional Euclidean space defined by

$$\Big\{ y_1^n : \sum_{i=1}^{n} (y_i - a_i)^2 > \sum_{i=1}^{n} (y_i - b_i)^2,\, E_1(a_1^n), E_2(b_1^n) \Big\}. \tag{40}$$

This region is illustrated by the shaded area ABCD in Fig. 5 for $n = 2$. The horizontal and vertical boundaries of ABCD correspond to the events $E_1(a_1^2)$ and $E_2(b_1^2)$. The elements of $A_i$ and $B_i$ are shown by o and x, respectively. The other boundary of ABCD, which corresponds to $\sum_{i=1}^{2} (y_i - a_i)^2 = \sum_{i=1}^{2} (y_i - b_i)^2$, is the perpendicular bisector of the line segment connecting $a_1^2$ to $b_1^2$. We may consider an $n$-cube inside this region with sides equal to some $\delta > 0$, as shown in Fig. 5, and perform the integration over this smaller region to obtain a lower bound for the term(s) in the summation (35) satisfying (38) and (39).

Fig. 5. Illustrating the regions of integration for dimension n = 2.
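In summary, for the terms in (35) which satisfy (38) and (39), we have

$$
\begin{aligned}
&\Pr\Big\{ \sum_{i=1}^{n} (y_i - a_i)^2 > \sum_{i=1}^{n} (y_i - b_i)^2,\, E_1(a_1^n), E_2(b_1^n) \,\Big|\, w_1, s_1^n \Big\} \\
&\ge \bigg[ 1 - 2Q\Big( \frac{\delta}{2\sigma} \Big) \bigg]^{n-1} \bigg[ Q\bigg( \frac{\sqrt{\sum_{i=1}^{n} d^2(t_i, r_i)}}{2\sigma} \bigg) - Q\bigg( \frac{\sqrt{\sum_{i=1}^{n} d^2(t_i, r_i)} + 2\delta}{2\sigma} \bigg) \bigg] \\
&\approx Q\bigg( \frac{\sqrt{\sum_{i=1}^{n} d^2(t_i, r_i)}}{2\sigma} \bigg) \quad \text{as } \sigma \to 0,
\end{aligned}
\tag{41}
$$

where the right-hand side of the inequality in (41) equals the integral of the joint probability density of $y_1^n = y_1 \cdots y_n$ (given $w_1$, $s_1^n$) over the smaller region, which is obtained by using the fact that $y_1^n$ is Gaussian centered at $c_1^n = a_1^n$ and by applying the necessary rotation.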
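Combining (36) and (41), each dominant pairwise term behaves like $Q(\sqrt{\sum_i d^2(t_i, r_i)}/(2\sigma))$ at high SNR, so a natural figure of merit for a code is the minimum of $\sum_i d^2(t_i, r_i)$ over codeword pairs. The sketch below computes it under the assumption that the distance between two symbols is $\min_{s, s'} |t(s) + s - r(s') - s'|$ (the definition used in the graph construction of Appendix II); the helper names and the binary example with $X = \{-1, +1\}$, $S = \{0, 1\}$ are hypothetical. With only the two maximally separated symbols, the metric reduces to the Hamming distance times $d^2$, in line with the binary-code reduction in the conclusion.

```python
import math


def qfunc(x):
    """Gaussian tail function Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def symbol_distance(t, r, S):
    """min |t(s) + s - r(s') - s'| over s, s' in S; t and r map each s to an input."""
    return min(abs(t[s] + s - r[u] - u) for s in S for u in S)


def min_sq_distance(code, S):
    """Minimum over codeword pairs of sum_i d^2(t_i, r_i), the quantity inside Q(.) in (36)."""
    best = math.inf
    for k, first in enumerate(code):
        for second in code[k + 1:]:
            d2 = sum(symbol_distance(t, r, S) ** 2 for t, r in zip(first, second))
            best = min(best, d2)
    return best


# Hypothetical binary example: X = {-1, +1}, S = {0, 1}.
S = (0, 1)
t_lo = {0: -1, 1: -1}              # A = {-1, 0}
t_hi = {0: 1, 1: 1}                # A = {1, 2}
code = [(t_lo,) * 3, (t_hi,) * 3]  # length-3 repetition code over {t_lo, t_hi}
d2 = min_sq_distance(code, S)      # Hamming distance 3 times d^2(t_lo, t_hi) = 1
sigma = 0.25
dominant_term = qfunc(math.sqrt(d2) / (2 * sigma))
```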

Appendix II

A Polynomial Complexity Algorithm for Finding Two Symbols of T with the Maximum Distance

We propose an algorithm for finding two symbols of $T$ with distance greater than or equal to some $d > 0$. Then we explain how to find two symbols in $T$ with the maximum distance. Consider the bipartite graph $G(U, V, E)$ shown in Fig. 6 with $2Q$ vertices in each part. Each of the non-intersecting sets $U_1, \ldots, U_Q$ contains two vertices of the upper part $U$, and each of the non-intersecting sets $V_1, \ldots, V_Q$ contains two vertices of the lower part $V$. The vertices of the sets $U_i = \{u_{i1}, u_{i2}\}$ and $V_i = \{v_{i1}, v_{i2}\}$ are labeled by the elements of the set $X + s_i = \{x_1 + s_i, x_2 + s_i\}$, $i = 1, \ldots, Q$. A vertex in $U_i$ is connected to a vertex in $V_j$ if the absolute value of the difference of their labels is greater than or equal to $d$, $i, j = 1, \ldots, Q$.
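A small Python sketch of this graph, assuming the inputs are given as a tuple X and the interference alphabet as a tuple S, with vertex (i, x) standing for the vertex of $U_i$ (or $V_i$) labeled $x + s_i$; the function names are hypothetical. The exhaustive search for $K_{Q,Q}$ is exponential in $Q$, so it only checks the construction on toy cases and is not a substitute for the polynomial-time program developed in this appendix.

```python
from itertools import product


def build_graph(X, S, d):
    """Edge set of G(U, V, E).

    Vertex (i, x) stands for the vertex of U_i (or V_i) labeled x + s_i.
    Edge ((i, x), (j, x2)) joins U_i to V_j when |x + s_i - x2 - s_j| >= d.
    """
    return {
        ((i, x), (j, x2))
        for i, s in enumerate(S) for x in X
        for j, s2 in enumerate(S) for x2 in X
        if abs(x + s - x2 - s2) >= d
    }


def find_kqq(X, S, d):
    """Exhaustively look for a K_{Q,Q} with one vertex in every U_i and every V_j.

    Returns the pair of symbols (t, r), with t[i] the input used when s_i occurs,
    or None if no such subgraph exists.
    """
    edges = build_graph(X, S, d)
    Q = len(S)
    for t in product(X, repeat=Q):
        for r in product(X, repeat=Q):
            if all(((i, t[i]), (j, r[j])) in edges for i in range(Q) for j in range(Q)):
                return t, r
    return None
```

For example, `find_kqq((-1, 1), (0, 1), 1)` selects the symbols that always send $-1$ and always send $+1$, whose point sets $\{-1, 0\}$ and $\{1, 2\}$ are at distance 1.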

From the definition of distance, there exist two symbols in $T$ with distance $d' \ge d$ if and only if $G$ has a complete bipartite subgraph $K_{Q,Q}$ with exactly one vertex in each $U_i$ and each $V_j$ (the vertex chosen in $U_i$ carries the label $t(s_i) + s_i$ of one symbol, and the vertex chosen in $V_j$ carries the label $r(s_j) + s_j$ of the other). If such a subgraph exists, we label the edges of the subgraph by 1 and the rest of the edges of $G$ by 0. We denote the label of edge $e$ by $y_e \in \{0, 1\}$. Such a labeling satisfies the following set of constraints:

$$\sum_{e:\, e \cap U_i \ne \emptyset} y_e = Q, \quad i = 1, \ldots, Q, \tag{42}$$

$$\sum_{e:\, e \cap V_i \ne \emptyset} y_e = Q, \quad i = 1, \ldots, Q, \tag{43}$$

$$y_e \in \{0, 1\}. \tag{44}$$

Note that by definition, an edge of a graph is a set of two vertices. Therefore, the notation $e \cap U_i$ in (42) is meaningful. Equations (42) and (43) state that the sum of the labels of the edges incident to any $U_i$, and likewise to any $V_i$, is $Q$.

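We devise an objective function for the constraints (42), (43), and (44) such that the objective function takes a given maximum value only for a labeling with label 1 for the edges of the subgraph $K_{Q,Q}$ and label 0 for the rest of the edges. Consider the following optimization problem

$$
\begin{aligned}
\max \quad & \sum_{i=1}^{Q} \sum_{j=1}^{2} \Big( \sum_{e:\, u_{ij} \in e} y_e \Big)^2 + \sum_{i=1}^{Q} \sum_{j=1}^{2} \Big( \sum_{e:\, v_{ij} \in e} y_e \Big)^2 \\
\text{subject to} \quad & \sum_{e:\, e \cap U_i \ne \emptyset} y_e = Q, \quad i = 1, \ldots, Q, \\
& \sum_{e:\, e \cap V_i \ne \emptyset} y_e = Q, \quad i = 1, \ldots, Q, \\
& y_e \in \{0, 1\}.
\end{aligned}
\tag{45}
$$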

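In the following, we find the maximum of the above optimization problem for the foregoing labeling. Given the constraints of (45), we have

$$\sum_{j=1}^{2} \Big( \sum_{e:\, u_{ij} \in e} y_e \Big) = \sum_{e:\, e \cap U_i \ne \emptyset} y_e = Q, \quad i = 1, \ldots, Q, \tag{46}$$

$$\sum_{j=1}^{2} \Big( \sum_{e:\, v_{ij} \in e} y_e \Big) = \sum_{e:\, e \cap V_i \ne \emptyset} y_e = Q, \quad i = 1, \ldots, Q. \tag{47}$$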

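If the sum of two nonnegative variables is constant, then the sum of their squares is maximized when one of the variables is zero, since $x^2 + z^2 = (x + z)^2 - 2xz$. Therefore, for any $i = 1, \ldots, Q$, the maximum of

$$\sum_{j=1}^{2} \Big( \sum_{e:\, u_{ij} \in e} y_e \Big)^2 \quad \text{and} \quad \sum_{j=1}^{2} \Big( \sum_{e:\, v_{ij} \in e} y_e \Big)^2$$

is $Q^2$, and this maximum occurs if and only if one vertex in each of $U_1, \ldots, U_Q$ and $V_1, \ldots, V_Q$ is connected to $Q$ edges with label 1 and the other vertex in each of these sets is not connected to any edge with label 1. This is equivalent to the existence of a subgraph $K_{Q,Q}$: each of the $Q$ selected vertices of $U$ has $Q$ edges with label 1, all of which end at the $Q$ selected vertices of $V$, so it is joined to every one of them. Then the maximum of the objective function in (45) is $Q \times Q^2 + Q \times Q^2 = 2Q^3$.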
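The value $2Q^3$ can be checked on small labelings. The sketch below, with hypothetical helper names, evaluates the objective of (45) for a labeling given as a dictionary from edges to labels, where vertex (i, x) denotes the vertex of $U_i$ or $V_i$ labeled $x + s_i$; it does not check that the edges belong to $G$. A $K_{Q,Q}$ labeling reaches $2Q^3$, while a labeling that satisfies (42) and (43) but spreads weight over both vertices of some $U_i$ or $V_j$ falls short.

```python
def objective(labels):
    """Objective of (45): sum over all vertices of U and V of (sum of incident labels)^2.

    labels maps an edge ((i, x), (j, x2)) from U_i to V_j to its label y_e;
    edges with label 0 may be omitted since they add nothing.
    """
    load_u, load_v = {}, {}
    for (u, v), y in labels.items():
        load_u[u] = load_u.get(u, 0) + y
        load_v[v] = load_v.get(v, 0) + y
    return sum(w * w for w in load_u.values()) + sum(w * w for w in load_v.values())


def kqq_labeling(t, r):
    """Label 1 on the Q^2 edges of the K_{Q,Q} selected by the symbols t and r."""
    Q = len(t)
    return {((i, t[i]), (j, r[j])): 1 for i in range(Q) for j in range(Q)}


Q = 3
value = objective(kqq_labeling((0, 0, 0), (1, 1, 1)))  # 2 * Q**3 = 54

# A Q = 2 labeling that meets (42) and (43) but is not a K_{2,2}.
spread = {((0, 0), (0, 0)): 1, ((0, 1), (1, 0)): 1,
          ((1, 0), (0, 1)): 1, ((1, 0), (1, 1)): 1}
spread_value = objective(spread)  # 10 < 2 * 2**3 = 16
```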
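We may relax the integrality constraint (44) and change the equality signs in (42) and (43) to inequality signs to obtain the following optimization program

$$
\begin{aligned}
\max \quad & \sum_{i=1}^{Q} \sum_{j=1}^{2} \Big( \sum_{e:\, u_{ij} \in e} y_e \Big)^2 + \sum_{i=1}^{Q} \sum_{j=1}^{2} \Big( \sum_{e:\, v_{ij} \in e} y_e \Big)^2 \\
\text{subject to} \quad & \sum_{e:\, e \cap U_i \ne \emptyset} y_e \le Q, \quad i = 1, \ldots, Q, \\
& \sum_{e:\, e \cap V_i \ne \emptyset} y_e \le Q, \quad i = 1, \ldots, Q, \\
& 0 \le y_e \le 1.
\end{aligned}
\tag{48}
$$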

Using the same argument as in the previous paragraph, the value $2Q^3$ is also achievable for the above maximization problem if and only if a subgraph $K_{Q,Q}$ of the graph $G$ exists. The above optimization problem is a quadratic programming problem [16] with convex objective function and can be solved in polynomial time [17] in terms of the number of edges of $G$, which is at most $4Q^2$.

In summary, we turned the problem of finding two symbols in $T$ with distance at least $d > 0$ into the quadratic programming problem (48). If the maximum value of (48) is $2Q^3$, then two such symbols are obtained from the optimal solution of (48). Otherwise, two such symbols do not exist.

To find two symbols in $T$ with the maximum distance, we need to run the described algorithm for several values of $d$. We can obtain an upper bound on the number of possible distances between symbols of $T$: every such distance is of the form $|x + s - x' - s'|$ with $x, x' \in X$ and $s, s' \in S$, so a loose upper bound is $M^2 Q^2 = 4Q^2$. By using the binary search algorithm [18], the search over possible distances can be done with complexity logarithmic in the number of possible distances.
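The binary search can be sketched in Python as follows. Because no QP solver is assumed here, the feasibility test `exists_pair` is a brute-force stand-in (exponential in $Q$) for solving (48) and comparing its optimum with $2Q^3$; the candidate list uses the bound on the number of distinct distances, and all names are hypothetical. The search relies on feasibility being monotone in $d$: if two symbols are at distance at least $d$, they are also at distance at least any smaller value.

```python
from itertools import product


def max_symbol_distance(X, S):
    """Maximum distance between two symbols of T, found by binary search over candidates.

    exists_pair(d) is a brute-force stand-in (exponential in Q) for solving (48)
    and checking whether its optimum equals 2Q^3; only the search logic follows the text.
    """
    Q = len(S)

    def dist(t, r):
        # min over s_i, s_j of |t(s_i) + s_i - r(s_j) - s_j|
        return min(abs(t[i] + S[i] - r[j] - S[j]) for i in range(Q) for j in range(Q))

    def exists_pair(d):
        return any(dist(t, r) >= d
                   for t in product(X, repeat=Q) for r in product(X, repeat=Q))

    # Every distance equals some |x + s - x2 - s2|: at most M^2 Q^2 candidates.
    candidates = sorted({abs(x + s - x2 - s2) for x in X for s in S for x2 in X for s2 in S})
    lo, hi = 0, len(candidates) - 1  # candidates[0] == 0 is always feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if exists_pair(candidates[mid]):
            lo = mid      # a pair at distance >= candidates[mid] exists
        else:
            hi = mid - 1  # no pair reaches candidates[mid]
    return candidates[lo]


print(max_symbol_distance((-1, 1), (0, 1)))  # 1
```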

It is worth mentioning that our proposed algorithm can be extended to find K > 2 
symbols of T with the maximum minimum distance among K symbols for the general 
case M > 2. 
Appendix III

An Example Showing That Using More than M Symbols of T Results in a Larger Minimum Distance (M > 2)

Consider the channel with X = {1, 4, 5, 7} and S = {0, 4}. Consider the following 
codebook with six codewords of length two that uses seven symbols of the associated 
channel. 



Codeword 1: ((4,1), (5,1))
Codeword 2: ((4,1), (1,5))
Codeword 3: ((5,4), (5,4))
Codeword 4: ((5,4), (4,5))
Codeword 5: ((1,5), (4,1))
Codeword 6: ((1,5), (1,4))



The minimum distance of the above code is 3. However, it can be verified by a computer 
program that any code for this channel with codebook size six and length two that uses 
any four symbols of the associated channel yields a minimum distance less than 3. 